Data Engineering for Scaling Language Models to 128K Context